Meta Inverse Reinforcement Learning via Maximum Reward Sharing for Human Motion Analysis
Authors
Abstract
This work addresses the inverse reinforcement learning (IRL) problem in which a demonstrator provides only a few demonstrations for each high-dimensional task, too few to estimate an accurate reward function. We make two observations: each demonstrator has an inherent reward for each state, and task-specific behaviors depend mainly on a small number of key states. Based on these, we propose a meta IRL algorithm in two steps. First, it models the reward function of each task as a distribution conditioned on a baseline reward function, which is shared by all tasks and depends only on the demonstrator. Second, it finds the most likely reward function in that distribution that explains the task-specific behaviors. We test the method on path-planning tasks with limited demonstrations in a simulated environment and show that it significantly improves the accuracy of the learned reward function. We also apply the method to analyze the motion of a patient undergoing rehabilitation.
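The abstract does not specify the exact model. What follows is only a minimal sketch of the idea under stated assumptions, not the authors' algorithm. The assumptions are:

- a tabular, deterministic MDP with state-only rewards;
- a maximum-entropy IRL likelihood for the demonstrations (a swapped-in choice);
- a Gaussian prior N(r_0, σ²I) centered on a known baseline reward r_0.

The task reward is then the maximum a posteriori estimate, found by gradient ascent starting from the baseline. All function names (`soft_policy`, `expected_visits`, `log_posterior`, `map_task_reward`) and the toy chain environment are hypothetical.

```python
import numpy as np

# Hypothetical sketch (not the paper's method): tabular deterministic MDP,
# state-only rewards, max-ent IRL likelihood, Gaussian prior on the baseline.
# transitions[s, a] is the index of the state reached from s under action a.

def soft_policy(reward, transitions, horizon):
    """Finite-horizon soft (maximum-entropy) value iteration.
    Returns the per-step stochastic policies and the soft value V_0."""
    v = np.zeros(transitions.shape[0])           # V_H = 0
    policies = []
    for _ in range(horizon):
        q = reward[:, None] + v[transitions]     # Q_t(s, a) = r(s) + V_{t+1}(s')
        v = np.logaddexp.reduce(q, axis=1)       # V_t(s) = log sum_a exp Q_t(s, a)
        policies.append(np.exp(q - v[:, None]))  # pi_t(a | s)
    return policies[::-1], v                     # policies[t] is the step-t policy


def expected_visits(reward, transitions, start, horizon):
    """Expected state-visitation counts over steps 0..horizon-1."""
    policies, _ = soft_policy(reward, transitions, horizon)
    d = np.zeros(transitions.shape[0])
    d[start] = 1.0
    visits = np.zeros_like(d)
    for pi in policies:
        visits += d
        nxt = np.zeros_like(d)
        for a in range(transitions.shape[1]):
            np.add.at(nxt, transitions[:, a], d * pi[:, a])
        d = nxt
    return visits


def log_posterior(r, demos, baseline, transitions, horizon, sigma):
    """Max-ent IRL log-likelihood plus a Gaussian prior centered on the baseline."""
    _, v0 = soft_policy(r, transitions, horizon)
    ll = sum(r[list(t[:horizon])].sum() - v0[t[0]] for t in demos)
    return ll - 0.5 * np.sum((r - baseline) ** 2) / sigma**2


def map_task_reward(demos, baseline, transitions, horizon,
                    sigma=1.0, lr=0.01, steps=500):
    """Gradient ascent on the log-posterior, starting from the baseline."""
    start = demos[0][0]                          # assumes a shared start state
    emp = np.zeros(len(baseline))                # empirical visitation counts
    for t in demos:
        for s in t[:horizon]:
            emp[s] += 1.0
    r = np.array(baseline, dtype=float)
    for _ in range(steps):
        grad = (emp - len(demos) * expected_visits(r, transitions, start, horizon)
                - (r - baseline) / sigma**2)
        r += lr * grad
    return r


# Toy usage: a 5-state chain with actions left, stay, right.
n = 5
transitions = np.array([[max(s - 1, 0), s, min(s + 1, n - 1)] for s in range(n)])
baseline = np.zeros(n)                           # hypothetical shared baseline reward
demos = [[0, 1, 2, 3, 4]] * 3                    # three demonstrations of one task
r_task = map_task_reward(demos, baseline, transitions, horizon=5)
```

A smaller `sigma` keeps the task reward closer to the shared baseline. This reflects the intuition behind sharing a reward across tasks when each task has only a few demonstrations. The sketch does not model the paper's emphasis on key states.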
Similar resources
Inverse Optimal Control
In Reinforcement Learning, an agent learns a policy that maximizes a given reward function. However, providing a reward function for a given learning task is often non-trivial. Inverse Reinforcement Learning, which is sometimes also called Inverse Optimal Control, addresses this problem by learning the reward function from expert demonstrations. The aim of this paper is to give a brief introduc...
Reinforcement Learning from Demonstration and Human Reward
In this paper, we propose a model-based method, IRL-TAMER, for combining learning from demonstration via inverse reinforcement learning (IRL) and learning from human reward via the TAMER framework. We tested our method in the Grid World domain and compared it with the TAMER framework using different discount factors on human reward. Our results suggest that with one demonstration, although an agen...
The Use of Apprenticeship Learning Via Inverse Reinforcement Learning for Generating Melodies
The research presented in this paper uses apprenticeship learning via inverse reinforcement learning to ascertain a reward function in a musical context. The learning agent then used this reward function to generate new melodies using reinforcement learning. Reinforcement learning is a type of unsupervised machine learning where rewards are used to guide an agent’s learning. These rewards are u...
Large-Scale Inverse Reinforcement Learning via Function Approximation for Clinical Motion Analysis
This paper introduces a new method for inverse reinforcement learning in large-scale and high-dimensional state spaces. To avoid solving the computationally expensive reinforcement learning problems in reward learning, we propose a function approximation method to ensure that the Bellman Optimality Equation always holds, and then estimate a function to maximize the likelihood of the observed mo...
Inverse Reinforcement Learning in Large State Spaces via Function Approximation
This paper introduces a new method for inverse reinforcement learning in large-scale and high-dimensional state spaces. To avoid solving the computationally expensive reinforcement learning problems in reward learning, we propose a function approximation method to ensure that the Bellman Optimality Equation always holds, and then estimate a function to maximize the likelihood of the observed mo...
Journal:
- CoRR
Volume: abs/1710.03592
Publication date: 2017